Compression Tutorial
Written by Jay

* Sorry about the screwed up diagrams. The doc version wasn't like this.

Table of Contents
-------------------

1)	Introduction
2)	Dictionary Method
3)	Fixed Bit Length Packing
4)	RLE Encoding and Variants
5)	LZ77 and Derivatives
6)	Huffman Encoding
7)	LZW Encoding
8) 	FAQs

1)	Introduction
----------------------

This is a compression tutorial. The purpose of this tutorial is not to show you how 
to crack compression; rather, it is intended to show you how compression methods 
work and how they can be adapted into programs that reduce file size. I personally do not 
like the term "compression", as it is possible for "compression" to generate a 
larger file (rarely, unless you compress an already packed file). I will use the 
term "encoding", as I believe it is a much better term.

2)	Dictionary method
----------------------------

The dictionary method is very simple. It uses a look-up table of commonly used 
words (in the computer's case we'd be talking about strings of bytes), and a 
special code that points to each word. eg.

Dictionary:	00: the
		01: dog
		02: steak
		03: and

Input Stream:
	the dog and I eat steak and onion on the log
Encoded Stream:
	00 01 03 I eat 02 03 onion on 00 log

On decoding, the decoder must also know the initial entries of the dictionary. 
Usually, each of the encoded bytes is prefixed with another byte to indicate that 
the next byte is in fact a dictionary entry. This method is very efficient on streams 
in which the same words repeat often. This method is also somewhat similar to the 
LZW algorithm.
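A minimal sketch of the idea in Python follows. The token-level representation here is my own simplification (a real implementation would pack escape bytes and indices into the byte stream), and all names are mine:

```python
# Dictionary-method sketch: known words are replaced by their index in a
# shared look-up table; everything else passes through as a literal.
DICTIONARY = ["the", "dog", "steak", "and"]

def encode(text):
    # Emit the dictionary index for known words, the word itself otherwise.
    return [DICTIONARY.index(w) if w in DICTIONARY else w
            for w in text.split()]

def decode(tokens):
    # Integers are dictionary look-ups; strings are literals. The decoder
    # must share the same initial dictionary as the encoder.
    return " ".join(DICTIONARY[t] if isinstance(t, int) else t
                    for t in tokens)

encoded = encode("the dog and I eat steak and onion on the log")
# encoded -> [0, 1, 3, 'I', 'eat', 2, 3, 'onion', 'on', 0, 'log']
```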

3)	Fixed Bit Length Packing
-----------------------------------

Fixed Bit Length encoding is the encoding of a stream in fewer bits than the usual 
eight per byte. A byte ranges from (0 - 255) and is 8-bits long. 
Suppose we have an input stream which contains only bytes that are less than 16. 
Rather than storing it as 8-bits per code, we can shrink it to 4-bits per code, 
as 4-bits cover the range (0 - 15). For example.
Input Stream:
	01 12 04 07 08 08 15
If we convert each code to binary:
00000001 00001100 00000100 00000111 00001000 00001000 00001111
we see that the 4 most significant bits (the 4 left-most bits) are unused. Therefore we 
can pack each one using 4 bits.
0001 1100 0100 0111 1000 1000 1111
(as you can see, the 4 most significant bits are truncated)
Then we pack all the bits into bytes
00011100 01000111 10001000 11110000  (the last code must be padded with 0's to make up a byte)
and convert back to decimal:
28 71 136 240
and the file is shrunk. So how do we figure out how many bits we should pack given 
a range of data? The table below will tell you:

----------------------------------------------------------------
| Minimum Bits to Encode	|	Range of Data	|
----------------------------------------------------------------
|		1		|	0	-	1	|
|		2		|	0	-	3	|
|		3		|	0	-	7	|
|		4		|	0	-	15	|
|		5		|	0	-	31	|
|		6		|	0	-	63	|
|		7		|	0	-	127	|
|		8		|	0	-	255	|
|		9		|	0	-	511	|
|		10		|	0	-	1023	|
|		11		|	0	-	2047	|
|		12		|	0	-	4095	|
|		13		|	0	-	8191	|
|		14		|	0	-	16383	|
|		15		|	0	-	32767	|
|		16		|	0	-	65535	|
----------------------------------------------------------------

If you punch n 1's into a calculator in binary mode and convert to decimal, you get 
the top of the range for n bits (2^n - 1). Sometimes your data may mostly fall 
within a small range (eg. below 16), while one rare byte happens to be a large 
number such as 255. This would force your code to be 8-bits and thus not pack the 
data at all. There is a method of "Variable Bit Length Encoding" that handles this, 
and it is known as Huffman Encoding.
Some games that use a lot of kanji, rather than using a prefix byte to select the 
kanji character (as that takes up 2 bytes and would waste a lot of room due to the 
amount of kanji in the game), may decide to pack the data as 12-bits or less to 
conserve space.
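The packing walked through above can be sketched in Python (the function names are my own):

```python
def min_bits(values):
    # Smallest n such that every value fits in n bits (the table above).
    return max(v.bit_length() for v in values) or 1

def pack(values, bits):
    # Concatenate each value as a fixed-width bit string, then pad the
    # tail with 0's to a whole number of bytes.
    bitstring = "".join(format(v, "0{}b".format(bits)) for v in values)
    bitstring += "0" * (-len(bitstring) % 8)
    return [int(bitstring[i:i + 8], 2) for i in range(0, len(bitstring), 8)]

codes = [1, 12, 4, 7, 8, 8, 15]
packed = pack(codes, min_bits(codes))   # -> [28, 71, 136, 240]
```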

4)	RLE Encoding and Variants
--------------------------------------

RLE is one of the simplest encoding methods known to man. You could probably 
make this one up yourself. RLE has many variants; in fact, I have rarely seen two 
programs use exactly the same RLE method. RLE stands for Run-Length Encoding, 
and is a specialized encoding method for repetitive data. It is a "specialized" 
encoding method because it can only be used on repetitive data, or else it will 
generate a much larger file. In this document, I will explain a few simple 
variants of RLE encoding.

Variant 1:

Suppose we have an input stream containing:

A A A A A A B B B B B B B D D D D D D C C

Such a repetitive stream can easily be encoded as:

6 A 7 B 6 D 2 C

As you can see, each character is preceded by a count. There are 6 A's, 7 B's, 6 
D's, and 2 C's. This makes the packed stream much smaller than the original 
size. Suppose instead we have an input stream containing:

A B D C D A B

The output stream will contain:

1 A 1 B 1 D 1 C 1 D 1 A 1 B

which, as you can see, produces a larger file than the original. This is why RLE 
is not very effective on non-repetitive streams. However, to help reduce this 
horrendous expansion of size, we go to the next variant of RLE coding.
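Variant 1 can be sketched in a few lines of Python (names are my own):

```python
def rle1_encode(data):
    # Walk the input, measuring each run and emitting (count, byte) pairs.
    out, i = [], 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        out += [j - i, data[i]]
        i = j
    return out

rle1_encode("AAAAAABBBBBBBDDDDDDCC")  # -> [6, 'A', 7, 'B', 6, 'D', 2, 'C']
rle1_encode("ABDCDAB")  # every run has length 1, so the output grows
```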

Variant 2:

Some RLE encoders will actually search through the entire file and find the least 
used byte in the stream. The least used byte becomes a special byte that indicates 
that a length and code character follows. For example:

A A A A A A A A A A A B N B B B

Ideally, the special character would not appear in the stream at all (most of the 
alphabet isn't used here, so any unused letter would do), but for demonstration 
purposes the least used character, N, is chosen. The encoder will generate:

N N 11 A B N 1 N N 3 B

As you can see, the first character in the stream (highlighted in bold in the *.doc 
version) tells us what the special character is. Every time the decoder encounters 
an N, the next byte read is the repeat count, followed by the character. This is 
much more efficient than variant 1, but only runs of more than 3 bytes benefit.

Variant 3:

In variant 3 of RLE, the input stream is grouped into blocks of up to 127 
bytes. Each block is prefixed with a special byte. If the special byte has bit 7 set, 
then the remaining 7 bits are the number of times the next character is 
duplicated. If bit 7 is clear, then the remaining 7 bits represent how many bytes 
are not compressed, and thus the decoder should output those bytes as is. For 
example:

A A A A A A B D G S H D D D

will be encoded as:

134 A 05 B D G S H 131 D

The bold numbers (if you're reading the *.doc version) are the special bytes. To 
decode the above stream, we first get a byte (134). Then we test bit 7 of the byte. 
If we convert 134 to binary we get 10000110. We see that bit 7 is set (bit 7 is the 
left-most bit), so we take the remaining 7 bits, 0000110, and convert them 
to decimal (6). The 6 is how many times the next byte (A) is duplicated. In the 
example, A is duplicated 6 times. The next byte read is 05. Convert it to binary: 
00000101, and bit 7 is not set. That means the remaining 7 bits (5) are how many 
bytes are not packed, so the next 5 bytes are output as is. This way, when a stream 
with non-repetitive bytes occurs, the expansion of the output is minimal.
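A decoder for this variant might look like the following Python sketch (I keep the stream as a Python list so the letters stay readable; a real stream would be raw bytes):

```python
def rle3_decode(stream):
    # Each block starts with a control byte: bit 7 set means "run",
    # bit 7 clear means "the next (value) bytes are literals".
    out, i = [], 0
    while i < len(stream):
        ctl = stream[i]; i += 1
        if ctl & 0x80:                    # run: low 7 bits = repeat count
            out += [stream[i]] * (ctl & 0x7F)
            i += 1
        else:                             # literal block of ctl bytes
            out += stream[i:i + ctl]
            i += ctl
    return "".join(out)

rle3_decode([134, "A", 5, "B", "D", "G", "S", "H", 131, "D"])
# -> "AAAAAABDGSHDDD"
```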

Variant 4:

Much like variant 3, but each byte is a literal byte unless its value is greater than 
192. If the value of the byte is greater than 192, then the repeat count is the 
value - 192, and the next byte is the one to be repeated. eg.

A A A A A B B D C

we can encode it as

197 A 194 B D C

This is useful for text files, as ASCII characters never go above 127 and thus never 
collide with the count values above 192.
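Variant 4's encoder can be sketched like so. The cap of 63 on run length (255 - 192) is my inference from the scheme, and the sketch assumes input bytes themselves never exceed 192:

```python
def rle4_encode(data):
    # Runs of 2..63 become (192 + count, byte); everything else is literal.
    # Assumes input bytes never exceed 192 (true for ASCII text).
    out, i = [], 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 63:
            j += 1
        if j - i > 1:
            out += [192 + (j - i), data[i]]
        else:
            out.append(data[i])
        i = j
    return out

rle4_encode("AAAAABBDC")  # -> [197, 'A', 194, 'B', 'D', 'C']
```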

Conclusion:

RLE has so many variants that it is impossible to describe them all. I have only 
described 4 of the many out there. You will know that you have RLE 
encoding if you notice that repetitive bytes are being encoded with counts.

5)	LZ77 and Derivatives
--------------------------------

The LZ77 algorithm was published by Abraham Lempel and Jacob Ziv in 1977. The LZ77 
algorithm has many derivatives (including LZSS). They are all so similar that they 
are pretty much the same thing. Anyways, the LZ77 algorithm consists of a sliding 
window and a read ahead buffer. The sliding window is basically a history buffer that 
keeps track of the last n bytes of the input stream. A sliding window usually ranges 
from 1k to 64k; 4k, as used by many LZSS implementations, is the most common. The 
read ahead buffer is the opposite of the sliding window: it keeps track of the next 
n bytes of the stream. The size of the read ahead buffer is usually small, up to 
258 bytes. The algorithm is basically this. Fill the read ahead buffer with the next 
n bytes (where n is the size of the read ahead buffer). Search the sliding window 
for the best match of the data in the read ahead buffer. If the match length reaches 
the minimum match length (which is determined by the encoder, usually by the size of 
the sliding window; for a 4k sliding window, the minimum match length is 2), then 
output a <length, distance> pair, where length is the length of the match, and 
distance is how many bytes backwards into the input stream the match is 
found. To clear this up, let's look at an example:
(we will be using a sliding window of 10 bytes, and a read ahead buffer of 5 bytes)

Input Stream:
	A A A A A A A A A A B A B A A A A A
	------------------- =========
	Sliding Window      Read Ahead
	                    Buffer

Suppose the sliding window contains 10 A's (in the example above), which were the 
last 10 bytes read. The read ahead buffer contains B A B A A. The first step of 
encoding is to search the sliding window for a match with the read ahead buffer of 
at least length 2. B A B A A is not found in the sliding window, so B is output 
to the stream as a literal byte. The sliding window slides over one byte.

Current Output Stream: B

Input Stream:
	A A A A A A A A A A B A B A A A A A
	  ------------------- =========
	  Sliding Window      Read Ahead
	                      Buffer

Now the read ahead buffer contains A B A A A. It is then compared with the sliding 
window. A B is found in the sliding window with a match length of 2, 
2 bytes backwards from the current byte. Therefore a <length, distance> 
pair is output. Since the length is 2 and the backwards distance is 2, the output 
is <2,2>. Next the sliding window slides over 2 bytes.

Current Output Stream: B <2,2>

Input Stream:
	A A A A A A A A A A B A B A A A A A
	      ------------------- =========
	      Sliding Window      Read Ahead
	                          Buffer

The read ahead buffer now contains A A A A A. What luck! There is a match of 
length 5 in the sliding window with a backwards distance of 8. The output will be 
<5,8>.

Final Output Stream: B <2,2> <5,8>

Pretty simple, eh? Now let's decode the stream. Suppose the last 10 bytes decoded 
were 10 A's.

Input Stream:
A A A A A A A A A A B <2,2> <5,8>

The read ahead buffer is not required during decoding, and the output produced so 
far serves as the sliding window. The first 10 A's are literal bytes, so they are 
decoded as is.

Current Output Stream: A A A A A A A A A A

The next byte B, is literal also, so it is decoded as is.

Current Output Stream: A A A A A A A A A A B

Next, a <length, distance> pair is encountered: <2, 2>. This means go back 2 
bytes and copy the 2 bytes there to the output stream.

Current Output Stream:
A A A A A A A A A A B
				<--
				2 bytes back
	A A A A A A A A A A B A B
				---
				 |->
				copied to output

Next is a <5,8>, meaning go 8 bytes back, and copy 5 bytes to the output.

Current Output Stream:
	A A A A A A A A A A B A B
		    <--------------
			8 bytes back
	A A A A A A A A A A B A B A A A A A
		    ---------
			  |---------->
  copied to output

Now there is one problem. How is one to determine which bytes are literal and 
which are <length, distance> pairs? The solution is to group the data in blocks of 8 
items with a prefix byte. The prefix byte, when converted to binary, flags whether 
each of the next 8 items is a literal or a <length, distance> pair. For example, if 
the encoded stream (in the example above) was:
B <2,2> <5,8>
which is one literal and two pairs, the flag bits would be 011 (0 for literal, 1 for 
<length, distance>). Then pad the remaining 5 bits with 0's to make up a byte: 
01100000 -> 96 (decimal). The final stream would be:
	96 B 2 2 5 8

The LZ77 described here is just the main idea of the algorithm. Some encoders will 
actually output a <distance, length> rather than a <length, distance> pair, and 
may have different ways of prefixing the data, so don't depend too much on 
everything I say. When LZ77 is combined with Huffman coding, you get a 
powerful compression tool, and this is one of the methods in the "zip" file format. 
See the "deflate" specs for more info.
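The decoding walkthrough above can be sketched in Python (the token representation and function name are my own; I seed the history with the ten A's the example assumes):

```python
def lz77_decode(tokens, history=""):
    # Literals are appended as-is; (length, distance) pairs copy `length`
    # bytes starting `distance` bytes back. Copying one byte at a time
    # lets a match overlap its own output (i.e. length > distance).
    out = list(history)
    for tok in tokens:
        if isinstance(tok, tuple):
            length, distance = tok
            for _ in range(length):
                out.append(out[-distance])
        else:
            out.append(tok)
    return "".join(out)

lz77_decode(["B", (2, 2), (5, 8)], history="A" * 10)
# -> "AAAAAAAAAABABAAAAA" (the original 18-byte stream)
```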

6)	Huffman Encoding
----------------------------

This method was published by David A. Huffman in 1952. This algorithm 
could be said to be a Variable Bit Length Encoding, as each code is a variable 
number of bits in length. The general idea of Huffman coding is to count the number 
of times each character appears in the input stream. From the counts, a binary 
tree is constructed using the Huffman algorithm to generate an optimized bit-length 
coding. There is, however, a problem with variable bit length encoding in general. 
Suppose our input stream consisted of {0, 1, 2, 3, 4, 5}; we could try to pack each 
symbol respectively as 0, 01, 10, 11, 100, 101. However, if we saw the binary 
sequence 0011011, we would not be able to decode it into a unique stream, for 
0011011 can represent 0 1 2 3 or it can also represent 0 0 3 0 3. However, if the 
symbols were coded as 0, 101, 1100, 1101, 1110, 1111, the binary sequence 
010111001101 can only produce 0 1 2 3. The Huffman algorithm takes care of this 
problem: no code is a prefix of another. I will now explain Huffman coding with a 
series of (ascii) diagrams, as it is hard to describe in text. Let's take an input 
stream:

A B E D E C E A E D C B E B A D E D E

We shall first count the number of times each character appears in the stream. A is 
found 3 times, B 3 times, C 2 times, D 4 times, and E 7 times. I will represent 
these counts with (character, count) pairs. So the first step looks like 
this:

(A, 3) (B, 3) (C, 2) (D, 4) (E, 7)

The basic idea of Huffman coding is to re-code the character with the highest 
count with the fewest bits. But first, we must sort the list by ascending counts. So 
the sorted list is:

(C, 2) (A, 3) (B, 3) (D, 4) (E, 7)

The first 2 nodes { (C, 2) and (A, 3) } are joined into a fictive node which I will 
call F1. F1 has a count equal to the sum of the counts of C and A:

	(F1, 5) (B, 3) (D, 4) (E, 7)
	   /\
	  /  \
	 /    \
   (C, 2)  (A, 3)

Next the list is sorted once again.

(B, 3) (D, 4) (F1, 5) (E, 7)
   	         /\
	  	/  \
	       /    \
            (C, 2)  (A, 3)

The first 2 nodes are therefore joined into another fictive node (F2):

	(F2, 7)		(F1, 5)		(E, 7)
	   /\		   /\
	  /  \		  /  \
         /    \		 /    \
     (B, 3) (D, 4)   (C, 2) (A, 3)

List Sorted:

        (F1, 5)		(F2, 7)		(E, 7)
	   /\		   /\
	  /  \		  /  \
         /    \		 /    \
     (C, 2) (A, 3)   (B, 3) (D, 4)

Nodes Joined in (F3):

		   (F3, 12)				(E, 7)
	              /\
		     /  \
		    /    \
		   /      \
	    (F1, 5)      (F2, 7)
	     /\		     /\
	    /  \	    /  \
           /    \	   /    \
    	 (C, 2) (A, 3)(B, 3) (D, 4)

Sort again:
(E, 7)	     	              (F3, 12)
				 /\
		                /  \
		               /    \
		              /      \
                         (F1, 5)  (F2, 7)
	   	            /\        /\
	  	           /  \	     /  \
                          /    \    /    \
    	               (C, 2) (A, 3)(B, 3) (D, 4)

Final Nodes joined in (F4):
		     (F4, 19)
		        /\
		       /  \
		      /    \
		     /	    \
		  (E, 7)   (F3, 12)
			      /\
		             /  \
		            /    \
		           /      \
                       (F1, 5)  (F2, 7)
	   	         /\        /\
	  	        /  \      /  \
                       /    \    /    \
    	             (C, 2) (A, 3)(B, 3) (D, 4)

Now if we view the final tree:


			<- 0           1 ->
				 /\
			      0 /  \1
			       /    \
			      /	     \
	 		   (E,7)     /\
			           0/  \1
			           /    \
		        	  /      \
	   		         /\      /\
		  	       0/  \1  0/  \1
        	               /    \  /    \
 	   	             (C,2)(A,3)(B,3)(D,4)

You can see that if we traverse right, the code bit is 1; if we traverse left, it is 0. 
Here is the table of resulting codes:
--------------------------------
|Character	|Code		|
--------------------------------
|	A	| 101		|
|	B	| 110		|
|	C	| 100		|
|	D	| 111		|
|	E	| 0		|
--------------------------------
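The tree construction above can be sketched with a priority queue. The tie-breaking here (favoring freshly merged nodes on equal counts) is my own choice to match the walkthrough; other valid Huffman trees would give different codes with the same total length:

```python
import heapq

def huffman_code_lengths(counts):
    # Heap entries are (count, tiebreak, {symbol: depth}). Merged nodes get
    # decreasing tiebreaks so, on equal counts, they are picked before leaves.
    heap = [(n, i, {sym: 0}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = 0
    while len(heap) > 1:
        n1, _, a = heapq.heappop(heap)       # two lowest-count nodes...
        n2, _, b = heapq.heappop(heap)
        tie -= 1
        merged = {s: d + 1 for s, d in {**a, **b}.items()}  # ...one level deeper
        heapq.heappush(heap, (n1 + n2, tie, merged))
    return heap[0][2]

huffman_code_lengths({"A": 3, "B": 3, "C": 2, "D": 4, "E": 7})
# -> code lengths {'E': 1, 'A': 3, 'B': 3, 'C': 3, 'D': 3}
```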

As you can see, E, having the highest character count, is encoded with 1 bit, while 
everything else uses 3 bits. Now the encoded form is:
Input:
A B E D E C E A E D C B E B A D E D E

Output:
101 110 0 111 0 100 0 101 0 111 100 110 0 110 101 111 0 111 0

The output is then packaged into bytes.
10111001 11010001 01011110 01100110 10111101 11000000 (last byte padded)
   185      209       94      102      189      192

So our 19-byte input stream is encoded into 6 bytes. There is a problem with 
decoding the stream from the output above: the decoder is unable to tell 
which bits represent which character. Also, because the last byte is 
padded with 0's, and E has the bit code 0, the decoder will output 5 extra E's. To 
solve the first problem, a header must be placed in front of the code. The header 
tells the decoder which bits represent which code. To solve the second 
problem, an EOF symbol (usually 256) is also added into the encoding with a 
count of 1. Whenever the decoder decodes the symbol 256, decoding must cease. 
There are many different ways to implement the header. The first way (the least 
efficient way) is, for each of the 256 possible bytes, to output the length of its 
code, then the code itself. Basically, the algorithm to read the header is:
	for i = 0 to 255
		length = get one byte
		code = read length bits
		huffmancode[i] = code
The decoder then must build a Huffman tree representation for decoding. The 
problem with this header is that it's large, and will occupy more than 256 bytes. 
This makes Huffman coding inefficient on small files. There are many clever ways 
to shrink down the size of the header, and I will let you figure those out yourself. The 
2nd way of implementing the header requires you to use consecutive Huffman 
coding.

Consecutive Huffman Coding

Consecutive Huffman Coding (actually I made up this term because I didn't know 
what to call it; it is widely known as canonical Huffman coding) is an efficient way 
of encoding the stream with a minimal header. It is no different from regular 
Huffman coding, except that each code is re-coded in such a way that, in the order 
of a known alphabet, the codes are consecutively increasing, and shorter lengths 
precede longer lengths. We build the tree as usual, and do all the above stuff. Then, 
instead of using the codes created by the tree, we re-code each code so that it is 
consecutively increasing. Using the above example:
------------------------------------------------
|Character	|Code		|Length	|
------------------------------------------------
|	A	| 101		|	3	|
|	B	| 110		|	3	|
|	C	| 100		|	3	|
|	D	| 111		|	3	|
|	E	| 0		|	1	|
------------------------------------------------
We must re-code it to become:
------------------------------------------------
|Character	|Code		|Length	|
------------------------------------------------
|	A	| 100		|	3	|
|	B	| 101		|	3	|
|	C	| 110		|	3	|
|	D	| 111		|	3	|
|	E	| 0		|	1	|
------------------------------------------------
As we can see, E (being the shortest length) precedes A, which precedes B, which 
precedes C, which precedes D. Using this technique, we can represent the output 
header with only the lengths of the codes. Our header would be 3 3 3 3 1. But given 
a header with only the lengths, how are we to reconstruct the binary codes, you 
ask? First you will need to count how many codes of each length occur. For 
example, we will use the Huffman header 3 3 3 3 1. We see that there are 4 
codes of length 3 and 1 code of length 1. Given this information, we can build the 
bit table with this algorithm:

code = 0
for i = 1 to maximum_bit_length (the longest code length in the header)
	for j = 1 to the number of codes with length i
		next_code_with_length_i = code
		code = code + 1
	end
	code = code x 2
end
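This algorithm can be sketched in Python (I pass the alphabet order in explicitly, since the decoder must already know it):

```python
def codes_from_lengths(alphabet, lengths):
    # Sort symbols by code length; Python's sort is stable, so symbols of
    # equal length keep their alphabet order, as the scheme requires.
    pairs = sorted(zip(alphabet, lengths), key=lambda p: p[1])
    code, prev_len, table = 0, 0, {}
    for sym, length in pairs:
        code <<= length - prev_len   # the "multiply by 2" steps in one go
        table[sym] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return table

codes_from_lengths("ABCDE", [3, 3, 3, 3, 1])
# -> {'E': '0', 'A': '100', 'B': '101', 'C': '110', 'D': '111'}
```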

To build from the header 3 3 3 3 1, we first set code to 0.
Next, since there is one code of length 1, the first (and only) length-1 code is 
assigned code.
------------------------------------------------
|Character	|Code		|Length	|
------------------------------------------------
|	E	|	0	|	1	|
------------------------------------------------
code is incremented by 1. Since there is only one code of length 1, code is then 
multiplied by 2; code now equals 10 (binary). Since there are no codes of length 2, 
code is again multiplied by 2 to get 100. There are 4 codes of length 
3. The first code of length 3 is assigned code.
------------------------------------------------
|Character	|Code		|Length	|
------------------------------------------------
|	E	|0		|	1	|
|	A	|100		|	3	|
------------------------------------------------
code is incremented by 1 to become 101, so the next length 3 code is assigned 
that.
------------------------------------------------
|Character	|Code		|Length	|
------------------------------------------------
|	E	|0		|	1	|
|	A	|100		|	3	|
|	B	|101		|	3	|
------------------------------------------------
code incremented, next code assigned.
------------------------------------------------
|Character	|Code		|Length	|
------------------------------------------------
|	E	|0		|	1	|
|	A	|100		|	3	|
|	B	|101		|	3	|
|	C	|110		|	3	|
------------------------------------------------
and again
------------------------------------------------
|Character	|Code		|Length	|
------------------------------------------------
|	E	|0		|	1	|
|	A	|100		|	3	|
|	B	|101		|	3	|
|	C	|110		|	3	|
|	D	|111		|	3	|
------------------------------------------------
And the Huffman codes are regenerated.

7)	LZW Encoding
------------------------

The LZW algorithm was created by Terry Welch, building on Lempel and Ziv's earlier 
work (hence Lempel-Ziv-Welch). This algorithm takes huge advantage of repetitive 
data. LZW is a dictionary encoding method. It uses a dictionary of up to 64k 
entries; normally a 4k dictionary is used. The first thing about LZW encoding is 
that the dictionary must start off containing every possible byte in the input 
stream. For example, an input stream containing only A B C D will have the initial 
dictionary entries:

0: A
1: B
2: C
3: D

The dictionary should also contain a clear code and an EOF code. So we make the 
next 2 entries as the clear code and EOF code.

0: A
1: B
2: C
3: D
4: Clear Code
5: EOF

So what is the Clear Code anyway? Well, since our dictionary size is limited to 4k, 
it will get filled up. The encoder may decide to clear the contents of the dictionary, 
and it then outputs the clear code so the decoder can do the same. The EOF code, 
when encountered by the decoder, terminates the stream.
The encoding of the data works with a prefix, which I will call P, and a 
suffix, which I will call S. First, P is set to NULL. The first byte is grabbed from the 
input stream and stored into S. The encoder then searches the dictionary for the 
string PS (PS is the prefix with the suffix appended; eg. if P was "Hell" and S was 
"o", then PS would be "Hello"). If PS is in the dictionary, which it is at first (every 
single byte is in the dictionary), then P is assigned the string PS and the process 
starts over. However, if PS is not in the dictionary, then PS is added as a new entry 
to the dictionary, the code for P is sent to the output stream, and P is assigned S. 
In general, the algorithm goes like this:

1: Initialize the Dictionary to all possible values in the input stream
2: Add the Clear Code and EOF to the Dictionary
3: P = NULL
4: S = next byte (if the input is exhausted, output the code for P,
    then the EOF code, and stop)
5: Search for the string PS in the dictionary
    if found
6:	P = PS
7:	go to step 4
    otherwise if not found
8:	output the code for string P
9:	P = S
10:	go to step 4
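The steps above can be sketched in Python (the alphabet is passed in explicitly, matching the example's A B C):

```python
def lzw_encode(data, alphabet):
    # Dictionary starts with every possible byte, then CC and EOF.
    d = {ch: i for i, ch in enumerate(alphabet)}
    cc, eof = len(d), len(d) + 1
    next_code = len(d) + 2
    out, p = [], ""
    for s in data:
        if p + s in d:
            p += s                    # grow the prefix
        else:
            d[p + s] = next_code      # add PS as a new entry
            next_code += 1
            out.append(d[p])          # output the code for P
            p = s
    out.append(d[p])                  # flush the final prefix
    out.append(eof)
    return out

lzw_encode("ABCABCA", "ABC")  # -> [0, 1, 2, 5, 7, 4]
```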

I betcha you're dying for an example. OK here goes. Suppose we have an input 
stream:

A B C A B C A

First, initialize the dictionary.
_________
Dictionary:
0: A
1: B
2: C

Then add CC (clear code) and EOF
_________
Dictionary:
0: A
1: B
2: C
3: CC
4: EOF

Now set P = NULL. Here are our statistics:
__________________
Dictionary:
0: A
1: B
2: C
3: CC
4: EOF

P = 	S = ?	PS = ?
Input: A B C A B C A
Output: Nothing yet

First we grab a byte from the input stream, A and store it to S. Then we search the 
dictionary for PS (PS = "A"). PS is in fact in the dictionary.
___________________________
Dictionary:
0: A	<------------- PS is found
1: B
2: C
3: CC
4: EOF

P = 	S = A	PS = A
Input: A B C A B C A
Output: Nothing yet

Then P = PS, since it was found. Next S is stored with the next byte, B.
___________________
Dictionary:
0: A
1: B
2: C
3: CC
4: EOF

P = A	S = B	PS = AB
Input: A B C A B C A
Output: Nothing yet

Now PS ("AB") is searched in the dictionary. "AB" is not found, so "AB" is added to 
the dictionary:
___________________
Dictionary:
0: A
1: B
2: C
3: CC
4: EOF
5: AB	<--------- Added to dictionary

P = A	S = B	PS = AB
Input: A B C A B C A
Output: Nothing yet

Next we output the code for P:
___________________
Dictionary:
0: A
1: B
2: C
3: CC
4: EOF
5: AB

P = A	S = B	PS = AB
Input: A B C A B C A
Output: 0	<--------- "A" is code 0

and P = S.
___________________
Dictionary:
0: A
1: B
2: C
3: CC
4: EOF
5: AB

P = B	S = B	PS = BB
Input: A B C A B C A
Output: 0

Now S is taken with the next byte:
___________________
Dictionary:
0: A
1: B
2: C
3: CC
4: EOF
5: AB

P = B	S = C	PS = BC
Input: A B C A B C A
Output: 0

PS ("BC") is not in the dictionary, so it is added. P ("B") is output as code 1, and 
P is assigned S ("C"). The next byte, A, is grabbed into S, and "CA" is searched for 
in the dictionary. "CA" is not in the dictionary and is thus added. The code for C 
(2) is output, and P is assigned S ("A"). The next byte is put into S ("B").
___________________
Dictionary:
0: A
1: B
2: C
3: CC
4: EOF
5: AB
6: BC
7: CA

P = A	S = B	PS = AB
Input: A B C A B C A
Output: 0 1 2

As you can see, this time "AB" is in the dictionary. Thus, P = PS.
___________________
Dictionary:
0: A
1: B
2: C
3: CC
4: EOF
5: AB
6: BC
7: CA

P = AB	 S = B	PS = ABB
Input: A B C A B C A
Output: 0 1 2

Next byte is stored into S
___________________
Dictionary:
0: A
1: B
2: C
3: CC
4: EOF
5: AB
6: BC
7: CA

P = AB	 S = C	PS = ABC
Input: A B C A B C A
Output: 0 1 2

PS ("ABC") is not in the dictionary so "ABC" is added as the next string, and the 
code for P ("AB", 5) is outputted.
___________________
Dictionary:
0: A
1: B
2: C
3: CC
4: EOF
5: AB
6: BC
7: CA
8: ABC

P = AB	 S = C	PS = ABC
Input: A B C A B C A
Output: 0 1 2 5

Now P = S ("C"), and the next byte gives PS = "CA". "CA" is in the dictionary, so 
P = "CA". The input is now exhausted, so the code for P (7) is output, followed by 
the EOF code (4). The final output is:

0 1 2 5 7 4

Oops, did I forget to tell you that each of these codes is packed with fixed bit 
length encoding? Since the dictionary starts off with 5 entries, we can pack 
each code in 3 bits. But what about the 8th entry? That one entry would force our 
code size to 4 bits. Well, guess what? Since we add entries to the dictionary in 
numerical order (0,1,2,3,4,...), we can boost the bit length as we go along: once 
entry 8 exists in the dictionary, every code written after that point is 4 bits long; 
once entry 16 exists, the following codes are boosted to 5 bits, and so on. 
Therefore, the final output will be:
000 001 010 101 0111 0100
(entry 8 was added to the dictionary just before the code 5 was output, so every 
code after that point is 4 bits long)
and package them into bytes:
00000101 01010111 01000000
    5       87       64
Thus making the output short. The decoding process is slightly different. First we 
must initialize the dictionary with all the possible bytes in the stream. Wait, how 
is the decoder supposed to know what bytes were originally in the stream? There 
should be a header, right? WRONG! The beauty of LZW is that a header is not 
required. Given a regular file, the stream may contain any byte from (0 - 255), so 
the initial entries in the dictionary will be 0 - 255. But for example's sake, I will 
create my own initial dictionary. The decoding algorithm:

1: Initialize the Dictionary, with the CC and EOF codes included
2: get a code and set S to its string
3: output S
4: P = S
5: get the next code
6:	if the code = CC, go to step 1
7:	if the code = EOF, terminate
    if the code is in the dictionary
8:	S = the string for the code
9:	output S
10:	add P + (first character of S) to the dictionary
11:	go to step 4
   otherwise if the code is not yet in the dictionary
12:	S = P + (first character of P)
13:	output S and add S to the dictionary
14:	go to step 4
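The decoding steps can be sketched in Python (again with an explicit alphabet; the "code not yet in the dictionary" branch is the special case that this particular example never triggers):

```python
def lzw_decode(codes, alphabet):
    strings = list(alphabet)              # code -> string table
    cc, eof = len(strings), len(strings) + 1
    strings += [None, None]               # placeholders for CC and EOF
    out, p = [], None
    for code in codes:
        if code == eof:
            break                         # EOF terminates decoding
        if code == cc:
            strings = list(alphabet) + [None, None]
            p = None                      # CC resets the dictionary
            continue
        if code < len(strings):
            s = strings[code]
        else:
            s = p + p[0]                  # code not yet in the dictionary
        out.append(s)
        if p is not None:
            strings.append(p + s[0])      # add P + first character of S
        p = s
    return "".join(out)

lzw_decode([0, 1, 2, 5, 7, 4], "ABC")  # -> "ABCABCA"
```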

Let's decode the encoded stream from the example above.
0 1 2 5 7 4

Initialize dictionary first:
___________________
Dictionary:
0: A
1: B
2: C
3: CC
4: EOF

We grab the first code (0) and output its string ("A").
Output: A
Then P = S and the next code, 1, is read. The code is in the dictionary, so its 
string "B" is output. P plus the first character of "B" ("AB") is added to the 
dictionary.
___________________
Dictionary:
0: A
1: B
2: C
3: CC
4: EOF
5: AB

Output: A B

Next P = S ("B"), and the next code (2) is read. Its string "C" is output, and P 
plus the first character of "C" ("BC") is added.
___________________
Dictionary:
0: A
1: B
2: C
3: CC
4: EOF
5: AB
6: BC

Output: A B C

5 is next, the process repeats.
___________________
Dictionary:
0: A
1: B
2: C
3: CC
4: EOF
5: AB
6: BC
7: CA

Output: A B C A B

Next code is 7, whose string is "CA"; it is output, and "AB" + "C" ("ABC") is added:
___________________
Dictionary:
0: A
1: B
2: C
3: CC
4: EOF
5: AB
6: BC
7: CA
8: ABC

Output: A B C A B C A

When the 4 (EOF) is encountered, decoding ceases.

8) 	FAQs
--------------

Q: Will I find the algorithms given in this document implemented exactly as you 
stated in compression programs today?

A: Nope, but you will find that the idea is pretty much the same.

Q: Which algorithm is the best?

A: Can't really say. Here are my views on each algorithm.

Dictionary Method: Encoding and decoding very fast. Good for data with lots of 
reused strings. OK Ratio.

Fixed Bit Length Encoding: Encoding and decoding very fast. Ratio is always fixed.

RLE: Very fast encoding and very fast decoding. Only useful for repetitive data such 
as raster data files. Ratio is OK.

LZ77: Pathetically slow encoding (even slower if you don't use a hash table). Fast 
Decoding. It has pretty good ratio and is great on general data streams. 
When combined with Huffman you get one of the methods in "zip".

Huffman: Moderate encoding and decoding speed. Will make nearly all kinds of data 
smaller. Encoding requires 2 passes over the data stream. Good ratio.

LZW: Fast encoding and decoding. Requires only one pass over the data and has good 
ratios, especially on repetitive data. (A 3MB file filled with 0's is packed down 
to less than 6k.) The biggest drawback is that a very non-repetitive stream 
will cause a larger file.

Q: There is a packed font I'm trying to decode. Now that I've decoded it, I've 
encoded my own font. But the size of my font differs from the size of the originally 
encoded font. What should I do?

A: Look for the code in which the font is loaded, and modify it to load an 
uncompressed font.
